Zeroth-order methods are extensively used in machine learning applications where gradients are infeasible or expensive to compute, such as black-box attacks, reinforcement learning, and language model fine-tuning. Existing optimization theory focuses on convergence to an arbitrary stationary point, but less is known about the implicit regularization that provides a fine-grained characterization of which particular solutions are reached. This paper shows that zeroth-order optimization with the standard two-point estimator favors solutions with small trace of Hessian, a measure widely used to distinguish between sharp and flat minima. The authors provide convergence rates of zeroth-order optimization to approximate flat minima for convex and sufficiently smooth functions, defining flat minima as minimizers that achieve the smallest trace of Hessian among all optimal solutions. Experiments on binary classification tasks with convex losses and language model fine-tuning support the theoretical findings.

Free, publicly accessible full text available June 5, 2026.
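For readers unfamiliar with the two-point estimator named in the abstract, the following is a minimal sketch of one common form of it, not the authors' implementation. It assumes Gaussian perturbation directions and a symmetric (central) difference. The function names, the step size `lr`, the smoothing radius `lam`, and the quadratic test objective are all hypothetical choices made for illustration.

```python
import numpy as np


def two_point_estimate(f, x, lam, rng):
    """Return one two-point zeroth-order gradient estimate of f at x.

    Assumption: Gaussian direction u and a central difference; only two
    function evaluations are used and no gradient of f is required.
    """
    u = rng.standard_normal(x.shape)  # random search direction
    slope = (f(x + lam * u) - f(x - lam * u)) / (2.0 * lam)
    return slope * u


def zeroth_order_descent(f, x0, lr=0.05, lam=1e-3, steps=2000, seed=0):
    """Run plain descent using the estimator above in place of the gradient."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x -= lr * two_point_estimate(f, x, lam, rng)
    return x


def quadratic(x):
    # Hypothetical convex test objective with its minimizer at the all-ones vector.
    return 0.5 * np.sum((x - 1.0) ** 2)


x_final = zeroth_order_descent(quadratic, np.zeros(5))
print(quadratic(x_final))
```

One way to see the connection to the trace of the Hessian: for a sufficiently smooth `f`, a second-order Taylor expansion gives E[f(x + lam * u)] = f(x) + (lam^2 / 2) * Tr(Hessian of f at x) plus higher-order terms, so averaging over Gaussian perturbations implicitly adds a penalty on the Hessian trace. The test objective here has a unique minimizer, so it illustrates only the estimator, not the preference among multiple minimizers.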